Association of the rs8720 and rs12587 KRAS Gene Variants with Colorectal Cancer in a Mexican Population and Their Analysis In Silico

Colorectal cancer (CRC) is a major global health challenge and one of the top 10 cancers in Mexico. Lifestyle and genetic factors influence CRC development, prognosis, and therapeutic response; identifying risk factors, such as the genes involved, is critical to understanding its behavior, mechanisms, and prognosis. We analyzed the association between two KRAS gene variants (rs8720 and rs12587) and CRC in the Mexican population. We genotyped 310 healthy individuals and 385 CRC patients using TaqMan assays and real-time PCR and performed an in silico analysis. The CC genotype of rs8720 and the GG genotype of rs12587 were identified as CRC risk factors (p < 0.05). The CC and TC genotypes of rs8720 were associated with rectal cancer, age over 50 years, moderately differentiated histology, and advanced cancer stage. The TG and GG genotypes of rs12587 were risk factors in the CRC group among patients with stage I–II disease, males, and patients with stage III–IV disease who did not respond to chemotherapy. The TG haplotype was protective against CRC. The combined CCGG genotype was linked to CRC risk. In silico analysis revealed that the rs12587 and rs8720 variants could influence KRAS gene regulation via miRNAs. In conclusion, the rs8720 and rs12587 variants of the KRAS gene were associated with CRC risk and could influence KRAS regulation via miRNAs.


Introduction
CRC is defined as the uncontrolled hyperproliferation of glandular epithelial cells in the colon and rectum due to genetic or epigenetic changes. This allows the gradual formation of a benign adenoma that can become cancerous and metastasize via molecular mechanisms such as microsatellite instability, chromosomal instability, and serrated neoplasia [1][2][3]. CRC is the most frequent neoplasm of the digestive tract, and, according to the latest epidemiological data provided by the Global Cancer Observatory (GLOBOCAN), it is the third most frequently diagnosed neoplasm and has the second highest mortality rate [4]. Although the classic risk factors for developing this disease are lifestyle, diet, family history, and chronic inflammation, CRC is a multifactorial and highly heterogeneous disease, and genetic and environmental factors play a large role in its appearance, development, and progression. These factors therefore determine the risk of developing the three known types of CRC: sporadic, hereditary, and colitis-associated [1]. The initiation, promotion, and progression of this neoplasm are determined by irreversible damage to the genetic material of the epithelial cells of the colon, which deregulates various molecular signaling pathways and induces an abnormal phenotype in these cells. The affected pathways include MAPK, PI3K/Akt, Hedgehog, ErbB, JNK, and BMP, among others [1][2][3].
A protein of great importance in cell signaling participates in some of these deregulated pathways in CRC: KRAS, which is encoded by the homonymous gene. KRAS belongs to the RAS family of oncogenes, which encode a series of proteins with GTPase activity. RAS alterations have been implicated in the appearance of up to 25% of human cancers, and 85% of these correspond to genetic alterations in KRAS. These alterations have been found in up to 98% of pancreatic ductal adenocarcinoma cases, 52% of CRC cases, and approximately 30% of lung adenocarcinomas [5,6].
The most frequent alterations found in KRAS are in codons 12, 13, 59, or 61, of which 97% correspond to changes or alterations to codon 12. In addition, it has been shown that KRAS alterations in CRC are associated with poor prognosis and resistance to treatment [7,8].
Even before the start of pharmacological therapy in metastatic cases, it is common clinical practice to test for alterations in KRAS, since the presence of alterations in this gene guides the choice of targeted and precise therapies [7,8].
In addition to the typical genetic alterations in KRAS, it has recently been shown that genetic variants, mainly single-nucleotide variants (SNVs) located at microRNA (miRNA) binding sites in the 3′ UTR of KRAS, play an important role in the regulation of this gene; therefore, they could be associated in various ways with tumor promotion [9]. The 3′ UTR is not removed during splicing; it is part of the mature mRNA, located downstream of the coding sequence. Therefore, the variants analyzed in this study are present in mature mRNA and may play an important role in gene expression regulation via interactions with miRNAs [10]. Several studies have shown an association between SNVs in the 3′ UTR of the KRAS gene and various types of cancer, including breast cancer, Wilms tumor, colorectal cancer, and glioma, among others [11][12][13][14][15][16][17]. According to the NIH SNP database, SNV rs12587 is located at chr12:25205894 (GRCh38.p14) and represents a T > G transversion, whereas rs8720 is located at chr12:25206009 (GRCh38.p13) and entails a T > C transition. SNV rs12587 has been associated with Wilms tumor [12] and glioma [18], but it has not been associated with CRC [13]. Meanwhile, the rs8720 variant has been reported as a risk factor for developing CRC in the Chinese population [19].
In this study, we analyzed, experimentally and in silico, two SNVs, rs12587 and rs8720, both located in the 3′ UTR of the KRAS gene, a crucial regulatory region for gene expression. Numerous studies indicate that the post-transcriptional regulation of KRAS is mediated in this region by various miRNAs, suggesting that these variants may interfere with its regulation [9,12,14,15]. These variants have been studied and associated with different types of cancer in diverse populations but not in the Mexican population, where studies on the association between KRAS gene variants and colorectal cancer are limited.

Experimental Subjects
The study was carried out at the Centro de Investigación Biomédica de Occidente, Instituto Mexicano del Seguro Social, and was approved by the local ethics committee (CLIES #1305) with the registration number R-2022-1305-081. All the procedures performed in the study were in accordance with the 1964 Declaration of Helsinki, and the participants provided written informed consent. In all, 385 patients with clinically and histologically confirmed CRC and 310 controls were included.
The databases used here (PolymiRTS, miRNASNP, miRDB, and miRTarBase) are widely used in combination for miRNA target prediction. The targets were evaluated using bioinformatics tools that analyze massive quantities of sequencing data and employ machine learning methodologies and algorithms to predict the potential target genes of miRNAs. Additionally, prediction tools were used to analyze how SNVs can modify miRNA binding sites, potentially affecting the miRNA-mRNA interaction. PolymiRTS integrates annotation data from databases such as UCSC and other sites focused on miRNA analysis, and it incorporates large-scale experiments such as GWAS and CLASH. miRNASNP examines the gain or loss of miRNA target sites for the different alleles of a variant; to do so, it uses target prediction tools such as TargetScan and miRmap. By leveraging these prediction algorithms, the software assesses the impact of genetic variants on miRNA target selection and, in turn, on the regulation of gene expression. In addition, miRDB predicts miRNA target sites in the 3′ UTR of mRNA using a machine learning model based on support vector machines (SVMs) and high-throughput training datasets. Furthermore, miRTarBase compiles miRNA-target interactions (MTIs) that have been validated by various experimental methods [PolymiRTS Database version 3.0, https://compbio.uthsc.edu/miRSNP/; miRNASNP version 3, http://bioinfo.life.hust.edu.cn/miRNASNP; MicroRNA Target Prediction Database (miRDB), https://mirdb.org; and miRTarBase version 9.0, https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2022/php/index.php; all accessed on 20 June 2023].
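The way an SNV can create or abolish a miRNA binding site can be illustrated with a minimal seed-matching sketch. This is not the algorithm used by PolymiRTS, miRNASNP, TargetScan, or miRmap; it only tests for a perfect match to miRNA positions 2-8. The miRNA sequence, the 3′ UTR fragment, the variant position, and all function names are hypothetical and chosen only for illustration.

```python
# Minimal sketch: does a single-nucleotide change in a 3' UTR fragment
# remove a perfect 7-mer seed match for a miRNA? All sequences are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """mRNA motif (5'->3') that pairs with miRNA seed positions 2-8."""
    return reverse_complement(mirna[1:8])


def has_seed_match(utr: str, mirna: str) -> bool:
    """True if the UTR fragment contains a perfect seed match."""
    return seed_site(mirna) in utr


def apply_allele(utr: str, index: int, allele: str) -> str:
    """Return the UTR fragment with the base at a 0-based index replaced."""
    return utr[:index] + allele + utr[index + 1:]


# Hypothetical 22-nt miRNA and a short UTR fragment carrying its seed site.
mirna = "UACGGAUCUAGCUAGCUAGCUA"
utr_ref = "AAAGAUCCGUAAA"                 # reference allele (U at index 5)
utr_alt = apply_allele(utr_ref, 5, "C")  # alternative allele (U > C)

print(seed_site(mirna))                # GAUCCGU
print(has_seed_match(utr_ref, mirna))  # True
print(has_seed_match(utr_alt, mirna))  # False
```

In this toy case, the alternative allele falls inside the seed-complementary motif, so the predicted site is lost; real tools additionally score site context, conservation, and binding energy.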

Co-Expression Analysis
Subsequently, a co-expression analysis of the previously filtered miRNAs was performed using the DepMap Portal (Broad Institute; https://depmap.org/portal/; accessed on 20 June 2023), a resource providing open access to analytical tools and gene expression databases. The results of the miRNA-mRNA binding prediction tools were used to investigate co-expression patterns. For the miRNAs and KRAS mRNA, normalized log2 values (relative to ploidy + 1) from the Copy Number Public 23Q2 consortium were employed. This analysis focused on samples from colorectal adenocarcinoma tumors.

KRAS Gene Expression Levels
In this study, in silico analyses were conducted to investigate the average gene expression levels of KRAS in colon adenocarcinoma (COAD) and rectal adenocarcinoma (READ) samples. The analysis was performed using the Gene Expression Profiling Interactive Analysis (GEPIA) tool (http://gepia.cancer-pku.cn; accessed on 20 June 2023), which integrates and analyzes data from the Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) repositories. For this analysis, average expression data of KRAS, previously normalized to the logarithmic scale log2 (TPM + 1), were compared between the 275 tumor samples (COAD) and 349 normal samples as well as 92 tumor tissue samples (READ) and 318 healthy tissue samples. A significance threshold of p < 0.01 was set.
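The log2 (TPM + 1) scale mentioned above can be sketched as follows. GEPIA performs this normalization and the statistical comparison internally; the TPM values and function names below are hypothetical and serve only to show the transformation and a difference of group means.

```python
import math
from statistics import mean


def log2_tpm_plus_one(tpm: float) -> float:
    """Transform a TPM value to the log2(TPM + 1) scale."""
    return math.log2(tpm + 1.0)


def mean_log2_difference(tumor_tpm, normal_tpm) -> float:
    """Difference of mean log2(TPM + 1) between tumor and normal samples."""
    tumor = [log2_tpm_plus_one(x) for x in tumor_tpm]
    normal = [log2_tpm_plus_one(x) for x in normal_tpm]
    return mean(tumor) - mean(normal)


# Hypothetical KRAS TPM values, not taken from TCGA or GTEx.
tumor_tpm = [30.0, 45.5, 62.1, 28.3]
normal_tpm = [20.2, 18.7, 25.0, 22.4]

print(round(mean_log2_difference(tumor_tpm, normal_tpm), 3))
```

Adding 1 before taking the logarithm keeps samples with zero expression defined (they map to 0) and compresses the range of highly expressed genes.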
A comparative analysis was conducted to examine the average expression levels of KRAS, segregated by tumor stage, in the COAD and READ samples. Additionally, an analysis of overall survival and disease-free survival was performed for both types of neoplasms.
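For readers unfamiliar with the survival estimates used in such analyses, the Kaplan-Meier product-limit estimator can be sketched in a few lines. This is a generic illustration under the usual assumptions (right-censored data, events processed at distinct times), not the GEPIA implementation; the follow-up times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  -- follow-up time of each subject
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set.
        n_at_risk -= n_removed
    return curve


# Hypothetical follow-up (months) and event indicators.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

Censored subjects reduce the number at risk without lowering the survival estimate, which is what distinguishes this estimator from a simple proportion of survivors.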

Statistical Analysis
The frequencies of the clinicopathologic variables, genotypes, and alleles were expressed as percentages. The observed and expected genotype frequencies in the control group were compared using the chi-square test to assess Hardy-Weinberg equilibrium (HWE). Genotype associations were analyzed using odds ratios and binary logistic regression in SPSS Statistics Base 24 (Chicago, IL, USA). The online version of the SHEsis program was used to analyze pairwise linkage disequilibrium (D') and haplotype frequency [20]. Kaplan-Meier analysis was used for the in silico survival analysis. A significance threshold of p ≤ 0.01 was set.
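The HWE check described above can be sketched as a one-degree-of-freedom chi-square test on genotype counts. This is a generic textbook version under the assumption of a biallelic locus with both alleles present; it is not the procedure or output of SPSS or SHEsis, and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic locus.

    Returns (chi2, p_value) with 1 degree of freedom.
    Assumes both alleles are present (no expected count of zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control-group genotype counts (e.g., TT, TC, CC).
chi2, p_value = hwe_chi_square(40, 20, 40)
print(round(chi2, 2), p_value < 0.001)  # 36.0 True
```

A non-significant result in controls is conventionally read as no detectable deviation from HWE, which supports the quality of the genotyping and sampling.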

Clinical and Demographic Characteristics of the Study Groups
The demographic variables of the study groups are described in Table 1. The average age in the group of patients with CRC was 59.86 ± 11.65 years; in the control group, the average age was 59.19 ± 13.56. The distribution by sex did not show statistically significant differences (p > 0.05); in the CRC group, 55% were men compared to 51% in the control group. There were no significant differences in the variables of tobacco and alcohol consumption (p > 0.05).

Haplotype Analysis of the rs8720 and rs12587 Variants of the KRAS Gene in the Study Groups
The comparison between the study groups showed a statistically significant difference in the frequency of the TG haplotype (OR 0.50, 95% CI 0.37-0.68, p = 0.0001) (Table 5). The linkage disequilibrium between the rs8720 and rs12587 variants showed D' = 0.35 and r² = 0.11 (p = 0.0001) in the control group.
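The two linkage disequilibrium measures reported above have standard definitions that can be sketched directly from haplotype and allele frequencies. This is a generic illustration, not the SHEsis implementation (which also estimates haplotype frequencies from unphased genotypes); the frequencies used are hypothetical and are not the study data.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D', r^2) for two biallelic loci.

    p_ab -- frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  -- frequency of allele A at locus 1
    p_b  -- frequency of allele B at locus 2
    Assumes both loci are polymorphic (0 < p_a < 1 and 0 < p_b < 1).
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical frequencies.
d_prime, r_squared = linkage_disequilibrium(p_ab=0.3, p_a=0.4, p_b=0.5)
print(round(d_prime, 2), round(r_squared, 3))  # 0.5 0.167
```

D' measures how much of the maximum possible association is observed given the allele frequencies, whereas r² also reflects how well one locus predicts the other, so the two values can differ considerably.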

Genotype Combination Analysis of the rs8720 and rs12587 Variants of the KRAS Gene in CRC and Control Groups
The frequencies of the following genotype combinations differed significantly between the CRC and control groups: CCGG (OR 2.4, 95% CI 1.46-3.9, p = 0.0005), TCGG (OR 0.30, 95% CI 0.18-0.54, p = 0.0002), and TTTG (OR 0.50, 95% CI 0.33-0.96, p = 0.049) (Table 6). These combinations did not show statistically significant differences with respect to the demographic and clinicopathological characteristics of the group of patients with CRC. The frequencies of the C (rs8720) and G (rs12587) alleles of the KRAS gene variants in the control group were statistically different from those of groups from different world populations (p < 0.05), except for the admixed Ashkenazi Jewish and Latino group (Figure 1). Allelic frequencies from other populations were taken from Ensembl, consulted in May 2023 (https://www.ensembl.org/Multi/Search/Results?q=rs8720;site=ensembl_all and https://www.ensembl.org/Multi/Search/Results?q=rs12587;site=ensembl_all; accessed on 20 June 2023).
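An odds ratio with its 95% confidence interval, as reported above, can be sketched from a 2 × 2 table using the Woolf (log) method. This is a generic, unadjusted calculation and may differ from the binary logistic regression output of SPSS used in the study; the counts and the function name are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf confidence interval from a 2x2 table.

    a -- cases carrying the genotype      b -- cases not carrying it
    c -- controls carrying the genotype   d -- controls not carrying it
    Assumes all four cells are greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not the study data.
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level: values above 1 suggest risk and values below 1 suggest protection.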

miRNAs Targeting the Genomic Regions of SNPs rs8720 and rs12587
Using in silico tools, we filtered for miRNAs whose binding sites could potentially be modified by the alleles of the rs8720 and rs12587 variants. The C allele of rs8720 allows binding to three different miRNAs, while the T allele shows affinity for four, including one previously validated experimentally. The G allele of rs12587 promotes binding to six distinct miRNAs, while the T allele binds to only one (hsa-miR-4328), which was the experimentally validated miRNA among those filtered (Figure 2).

Analysis of miRNA/mRNA KRAS Expression Profiles
In the analysis performed using the DepMap portal, we observed the expression profiles of each miRNA interacting with the rs8720 and rs12587 variants as well as the expression profile of KRAS mRNA in colorectal adenocarcinoma tissues. The expression analyses are shown in Figure S1 (Expression analysis of KRAS miRNA), and the correlation values are presented in Table 7.

From the miRTarBase database (https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2022/php/index.php; accessed on 20 June 2023), a list of genes potentially regulated by hsa-miR-4328 was obtained. Additionally, using the DAVID Bioinformatics Resources database (https://david.ncifcrf.gov/home.jsp; accessed on 20 June 2023), we reviewed the pathways involving the genes regulated by this miRNA, including cellular proliferation, apoptosis, and other cancer-related pathways (Figure 2).

Discussion
In Mexico, CRC is the third most common cancer and has the second highest cancer mortality rate in both men and women [4,21]. Its frequency rises from approximately 50 years of age, reaching a peak at 88 years of age [21]. This is consistent with the average age in this study. Different risk factors related to CRC have also been noted; in our patients, we observed similar proportions of colon and rectal tumors, a high frequency of moderately differentiated adenocarcinoma histology, and a high frequency of advanced stage III disease.
Different cellular signaling pathways participate in cellular proliferation and differentiation, and one pathway of interest is the KRAS signaling pathway. The activation of KRAS is known to stimulate the MAP kinase and PI3K-AKT-mTOR signaling pathways as well as pathways that participate in invasion and metastasis (TIAM1-RAC and RAL). Furthermore, different frequencies of KRAS mutations have been reported in tumors, with a high incidence in CRC tumors (around 40%), mainly in the region encoding the P-loop of the protein (codons 12 and 13) [22]. The expression of the KRAS gene is regulated at the 3′ UTR by small non-coding RNAs approximately 20 nucleotides in length, called microRNAs (miRNAs), which participate in different processes, such as proliferation, migration, invasion, and tumor development. In CRC, miRNAs can play a dual role in these regulatory pathways, acting as tumor suppressors (miR-143) or as oncogenes (miR-21) [23].
miRNAs are altered in cancer and have been associated with different gene variants [10]. The literature contains few association studies on the rs8720 variant; only one study, conducted in a Chinese population, has associated it with susceptibility to CRC [19].
That study associated the TT genotype and the T allele with risk, which contrasts with our findings: we observed that the CC genotype and the C allele of the rs8720 variant were associated with a risk of developing CRC (p < 0.05). This is the first study to report an association of this variant with the risk of developing CRC in a Mexican population.
For the rs12587 variant, the data in the present study showed an association between the GG genotype and CRC risk (p < 0.05). The only previous study, which was carried out in a Chinese population and included 430 patients with CRC and the same number of controls, found no association with the rs12587 variant [13].
The importance of conducting studies in each population is evidenced by the comparison of the C (rs8720) and G (rs12587) allele frequencies in the control group of this study with those of control groups from other populations: differences were observed with Finnish, European, East Asian, and African American populations for both KRAS variants. Notably, the frequency of the T allele of both polymorphisms shows similar segregation in the populations that did not differ significantly from the Mexican population. The exception is the population of Puerto Rico, where the segregation of the T allele shows an inverse behavior. This evidences the genetic heterogeneity of this gene. Latin American populations are characterized by being mestizo; therefore, they have high genetic diversity [24]. However, more studies are needed to verify the inverted allelic frequencies observed for the rs8720 and rs12587 variants between the Puerto Rican and Mexican populations, since the comparison data were taken from a repository and could not be independently verified here.
The association analysis of the clinical variables of the CRC patients with the KRAS gene variants showed that carrying the CC and TC genotypes of the rs8720 variant was linked with rectal cancer, age over 50 years, progression, and moderately differentiated histology. In the literature, there is only one study, which analyzed 1142 patients with CRC and the same number of controls from the Chinese population; its authors observed an association between the T allele of the rs8720 variant in CRC patients and invasion beyond the serosa, suggesting that the T allele may be correlated with CRC progression [19]. However, in the present study, the C allele was the most frequent in the Mexican population, which may explain why the C allele was the one associated with the clinicopathological characteristics of the patients with CRC.
Notably, the GG and GT genotypes of the rs12587 variant showed statistically significant differences for male sex, stages I-II, and nonresponse to chemotherapy in stages III-IV. Only one study in the literature has analyzed the association between the rs12587 variant and CRC, in the Chinese population; however, the authors found no association [13]. Although no existing studies support these findings, they show that stratification by these potential confounding factors is important, since it contributes to differences in the associations of the rs8720 and rs12587 variants with CRC risk.
It has been observed that the synergistic effect of RAS and the mTOR complex and the PI3K/AKT and ERK pathways has an important function in cell survival and aging. In fact, age is an important factor in the formation of cancer. It has been shown that, based on the time of diagnosis, CRC can be divided into two groups: early (before the age of 50) or late (after the age of 50) [25].
It is worth noting that CRC is a multifactorial entity and that no single gene is responsible for originating metastasis; different studies have shown that the most common sites of CRC metastasis are the liver and lung. It has also been shown that lung metastasis is more frequent in rectal cancer and that overall survival is greater than that of patients with colon cancer. However, much remains to be known about the pathways involved in the development of metastasis of CRC tumor cells. Different RNAs of specific genes in circulating tumor cells in the bloodstream of CRC patients have been shown to be associated with cell motility, apoptosis, cell signaling and interaction, and their connection to neutrophils, all of which play important roles in the development of metastases [25].
Moreover, CRC patients with KRAS mutations have been shown to have an inferior response to most KRAS-targeted therapies. The CRC consortium has therefore suggested that a comprehensive understanding of the molecular interactions in KRAS-mutant CRC signaling pathways is important and that the integration of multi-omics data (genome, transcriptome, epigenome, metabolome, and immunome) can help to understand and propose combination therapies with potential therapeutic use in KRAS-mutant CRC [7].
Although much remains unknown about the molecular mechanisms of the KRAS gene, it has recently been shown that variants in the 3′ UTR of the gene can prevent the binding of gene regulatory molecules, such as the miRNAs that drive the epigenetic regulation of KRAS expression, further increasing the complexity of the known KRAS biology. miRNAs form a heterogeneous group of small, single-stranded non-coding RNA molecules. Currently, several miRNAs (let-7, miR-193, miR-143, miR-18a) have been identified that target the KRAS 3′ UTR, leading to KRAS mRNA degradation and/or repression of KRAS. Consistent with their function, KRAS-targeting miRNA levels were found to be decreased in CRC, highlighting miRNAs as potential diagnostic or prognostic biomarkers, either as single factors or in miRNA panels [7]. This decrease can contribute to the development of cancer. Additionally, variants in the 3′ UTR of the KRAS gene can alter its regulation by disrupting complementary sites, which promotes tumor progression. The variant alleles of rs8720 and rs12587 are probably located in target sites for recognition and modulation and, consequently, may influence the imbalance of KRAS and the survival of cancer cells [7,11,14,25].
The KRAS variants analyzed in this study were not shown to be in linkage disequilibrium. In the groups analyzed, the haplotype of rs8720 and rs12587 observed to act as a protective factor against susceptibility to CRC was TG (present in 20% of the controls and 11% of the patients with CRC). Unfortunately, no study on CRC has analyzed this association. Thus, the combination of the two KRAS variants provides important information for identifying the haplotypes that confer protection against susceptibility to CRC.
On the other hand, the analysis of the genotype combinations of the two KRAS variants (rs8720 and rs12587) showed the CCGG combination to be a risk factor for susceptibility to CRC, suggesting that the risk is conferred when both risk alleles are present. It should be noted that more population studies are necessary to demonstrate this association.

In Silico Analysis
Through the use of repositories, the differential expression of 13 miRNAs that interact with the rs8720 and rs12587 variants was demonstrated. The relationship of these 13 miRNAs with different pathologies has been analyzed via experimental studies in culture or animal models as well as in groups of patients. In this regard, hsa-miR-885-5p acts as a suppressor of the expression of a lipid receptor and sterol transporter and is associated with the regulation of fatty liver and lipoprotein metabolism [26]. A study of hsa-miR-497-3p performed in a rat model of induced physiological left ventricular hypertrophy showed that the expression levels of miR-26b-5p, miR-204-5p, and miR-497-3p participated in autophagy regulation [26,27]. Another study has suggested that miR-497 and miR-1246 are possibly involved in the progression of hepatocarcinoma by regulating target genes [28]. hsa-miR-4328 can be considered a possible biomarker in acute promyelocytic leukemia [29]. In addition, it has also been found to participate in diabetic retinopathy [30]. hsa-miR-382-3p has been related to the regulation of spermatogenesis [31]. Meanwhile, hsa-miR-182-3p participates in the pathogenesis of vascular remodeling in pulmonary arterial hypertension [32] as well as in the regulation of osteosarcoma through the EBF2 regulation pathway [33]. Furthermore, it was shown that the overexpression of hsa-miR-152-5p inhibits the progression of fibrosis in keloids [34]. Another study found this microRNA to be a potential biomarker for ST-segment elevation myocardial infarction [35]. It has been suggested that hsa-miR-11399 regulates interleukin 6 (IL-6), associating it with vascular events, stress, and depression [36]. On the other hand, there is a lack of studies on other miRNAs (e.g., hsa-miR-6512-5p, hsa-miR-597-3p, hsa-miR-5680, hsa-miR-551b-5p, hsa-miR-506-5p, hsa-miR-2117).
The miRNAs identified in the present study using bioinformatics tools suggest new lines of research on the KRAS gene in CRC and in other pathologies. It should be noted that hsa-miR-4328 was found to regulate genes involved in various cell signaling pathways, including EGFR, RET/PTC, KRAS, C-JUN, and AP1/SP1 in the MAPK pathway and the ET1 and PTEN genes in the PI3K-AKT signaling pathway. However, the in silico study did not show differences in mRNA expression levels or survival between colon and rectal cancer. Although high expression was observed in both, it cannot be ruled out that these miRNAs are involved in the regulation of the gene, since the in silico analysis showed that at least 13 miRNAs were associated with the alleles of the rs8720 and rs12587 variants analyzed in the present study. New studies are recommended to validate this information.

Conclusions
Our results showed that the CC genotype, the C allele, and the dominant model (TC + CC) of the rs8720 variant were associated with a risk for CRC when the controls and patients were compared. Furthermore, differences were observed among the patients with CRC carrying the CC and TC genotypes of the rs8720 variant with respect to rectal cancer, stage I-II, age greater than 50 years, moderately differentiated tumors, and stages III-IV. The GG genotype of the rs12587 variant was also a risk factor for CRC when the controls and patients were compared. Differences were also observed among the patients with CRC carrying the GG and TG genotypes with respect to stage I-II, male sex, and nonresponse to chemotherapy in stages III-IV. The TG haplotype was a protective factor against susceptibility to CRC. The identification of 13 miRNAs interacting in silico with the analyzed variants, as well as of different signaling pathways, is important. More studies are needed to confirm the observed findings.

Informed Consent Statement:
Written consent was obtained from all participants prior to their involvement in the study.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request, due to privacy/ethical restrictions.